| Input (x) | Output (y) | Application |
|---|---|---|
| Home features | Price | Real Estate |
| Ad, user info | Click on an Ad? (0/1) | Online Advertising |
| Image | Object (1, …, 10) | Photo tagging |
| Image, Radar info | Position of other cars | Autonomous driving |
| Audio | Text transcript | Speech recognition |
| English | Chinese | Machine translation |
| Voice | Voice | Human-computer conversation |
\[X=\left[\begin{array}{cccc} x_{1}^{(1)} & x_{1}^{(2)} & \dotsb & x_{1}^{(m)}\\ x_{2}^{(1)} & x_{2}^{(2)} & \dotsb & x_{2}^{(m)}\\ \vdots & \vdots & \ddots & \vdots\\ x_{n_{x}}^{(1)} & x_{n_{x}}^{(2)} & \dotsb & x_{n_{x}}^{(m)} \end{array}\right]\in\mathbb{R}^{n_{x}\times m}\]
\[y=[y^{(1)},y^{(2)},\dots,y^{(m)}] \in \mathbb{R}^{1 \times m}\]
\(\hat{y}^{(i)} = \sigma(w^Tx^{(i)} + b)\) where \(\sigma(z) = \frac{1}{1+e^{-z}}\)
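The prediction formula can be applied to all \(m\) examples at once by working on the whole matrix \(X\). The following is a minimal NumPy sketch under the column-per-example layout defined above; the function names `sigmoid` and `predict_proba` and the toy values of `X`, `w`, and `b` are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z)), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, b, X):
    """Return y_hat with shape (1, m) for w of shape (n_x, 1) and X of shape (n_x, m)."""
    return sigmoid(w.T @ X + b)

# Toy data (assumed for illustration): n_x = 2 features, m = 3 examples stored as columns.
X = np.array([[1.0, 2.0, -1.0],
              [0.5, -1.0, 0.0]])
w = np.array([[1.0], [-2.0]])
b = 0.0

y_hat = predict_proba(w, b, X)  # one probability per column of X
```

Because each example is a column of \(X\), the product \(w^TX\) yields a \(1 \times m\) row that lines up with the row vector \(y\).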
For logistic regression,
\[\min_{w,b} J(w,b)= \frac{1}{m} \sum_{i=1}^{m}L(\hat{y}^{(i)}, y^{(i)}) + \text{penalty}\]
where
\[L_2\ \text{penalty}=\frac{\lambda}{2m}\| w \|_2^2 = \frac{\lambda}{2m}\sum_{i=1}^{n_x}w_i^2\]
\[L_1\ \text{penalty} = \frac{\lambda}{m}\| w \|_1 = \frac{\lambda}{m}\sum_{i=1}^{n_x}|w_i|\]
For a neural network,
\[J(w^{[1]},b^{[1]},\dots,w^{[L]},b^{[L]})=\frac{1}{m}\sum_{i=1}^{m}L(\hat{y}^{(i)},y^{(i)}) + \frac{\lambda}{2m}\sum_{l=1}^{L} \| w^{[l]} \|^2_F\]
where \(\| w^{[l]} \|^2_F = \sum_{i=1}^{n^{[l]}}\sum_{j=1}^{n^{[l-1]}} (w^{[l]}_{ij})^2\) and \(n^{[l]}\) is the number of units in layer \(l\) (with \(n^{[0]} = n_x\)).
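The regularized costs above can be sketched in NumPy as follows. The notes do not spell out the per-example loss \(L\), so this sketch assumes the usual binary cross-entropy; the function names, the small `eps` guard inside the logarithms, and the toy data are likewise illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l2_penalty(w, lam, m):
    """(lambda / 2m) * sum_i w_i^2"""
    return lam / (2 * m) * np.sum(w ** 2)

def l1_penalty(w, lam, m):
    """(lambda / m) * sum_i |w_i|"""
    return lam / m * np.sum(np.abs(w))

def frobenius_penalty(weights, lam, m):
    """(lambda / 2m) * sum over layers of the squared Frobenius norm of w^[l]."""
    return lam / (2 * m) * sum(np.sum(W ** 2) for W in weights)

def logistic_cost(w, b, X, y, lam=0.0, penalty=l2_penalty):
    """Mean cross-entropy loss (assumed form of L) plus the chosen penalty.

    X has shape (n_x, m), y has shape (1, m), w has shape (n_x, 1).
    """
    m = X.shape[1]
    y_hat = sigmoid(w.T @ X + b)
    eps = 1e-12  # avoids log(0); an implementation detail, not part of the formula
    losses = -(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
    return np.sum(losses) / m + penalty(w, lam, m)

# Toy data (assumed): n_x = 2 features, m = 3 examples stored as columns.
X = np.array([[1.0, 2.0, -1.0],
              [0.5, -1.0, 0.0]])
y = np.array([[1.0, 1.0, 0.0]])
w = np.array([[1.0], [-2.0]])

cost_plain = logistic_cost(w, 0.0, X, y, lam=0.0)
cost_l2 = logistic_cost(w, 0.0, X, y, lam=0.6, penalty=l2_penalty)
cost_l1 = logistic_cost(w, 0.0, X, y, lam=0.6, penalty=l1_penalty)
```

Note that only the weights are penalized; the bias \(b\) does not appear in any of the penalty terms, which matches the formulas above.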
Compile the NN model: define the loss function, the optimizer, and the metrics to track.
Fit the NN model on the training dataset: set the number of epochs, the mini-batch size, and the size of the validation set on which the metrics are checked during training.
Predict on the testing dataset using the fitted NN model.
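The compile, fit, and predict steps above resemble the Keras workflow, so the following sketch assumes the Keras API shipped with TensorFlow; the notes themselves do not name a library. The synthetic data, the layer sizes, and the hyperparameter values are all illustrative assumptions. Keras expects one example per row, so the \(n_x \times m\) matrix \(X\) is transposed before use.

```python
import numpy as np
from tensorflow import keras

# Synthetic binary-classification data (assumed), in the notes' layout: X is (n_x, m), y is (1, m).
rng = np.random.default_rng(0)
n_x, m = 4, 200
X = rng.normal(size=(n_x, m)).astype("float32")
y = (X.sum(axis=0, keepdims=True) > 0).astype("float32")

# Keras uses one example per row, so transpose to (m, n_x) and (m, 1).
X_rows, y_rows = X.T, y.T
X_train, y_train = X_rows[:160], y_rows[:160]
X_test, y_test = X_rows[160:], y_rows[160:]

# A small network with one hidden layer; the sizes are arbitrary.
model = keras.Sequential([
    keras.Input(shape=(n_x,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Compile: loss function, optimizer, and metrics to track.
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# Fit: epochs, mini-batch size, and the fraction of training data held out for validation.
history = model.fit(X_train, y_train, epochs=5, batch_size=32,
                    validation_split=0.2, verbose=0)

# Predict: one probability per test example, shape (40, 1).
y_hat = model.predict(X_test, verbose=0)
```

The `history.history` dictionary holds the per-epoch training and validation metrics, which is where the metrics chosen at compile time can be inspected.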